5 kinds of NoSQL

FrOSCon 2020

Henrik Ingo, DataStax

RDBMS

NoSQL

Key-Value

It's fast because...
  • It's simple
  • It's in RAM
  • It's denormalized
  • Can use hash index
  • Hash based sharding
redis 127.0.0.1:6379> SET name "Henrik"
OK 
redis 127.0.0.1:6379> GET name 
"Henrik"
redis 127.0.0.1:6379> SET age "43"
OK 
redis 127.0.0.1:6379> GET age
"43"
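
The "hash index" and "hash based sharding" bullets can be sketched in a few lines of Python. This is a toy model, not how Redis is implemented: the class name `ShardedKV`, the shard count and the use of MD5 as the routing hash are all assumptions made for illustration.

```python
import hashlib


class ShardedKV:
    """Toy key-value store: every shard is a dict, i.e. a hash index held in RAM."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash, so the same key is always routed to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def set(self, key, value):
        self.shards[self._shard_for(key)][key] = value
        return "OK"

    def get(self, key):
        # One hash to pick the shard, one hash lookup inside it.
        return self.shards[self._shard_for(key)].get(key)


store = ShardedKV()
store.set("name", "Henrik")
store.set("age", "43")
print(store.get("name"))
print(store.get("age"))
```

Because a lookup never has to compare or join records, each read touches exactly one shard and one dictionary slot.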

Use cases

Cache. Session cache.

In-memory, low latency computing. (Write heavy.)

Recommendation engines & Machine Learning.

Queue
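
The session cache use case is, roughly, a key-value store whose entries expire. A minimal sketch, assuming a hypothetical `TTLCache` class and an injectable clock so the demo does not need to sleep (Redis itself covers this with key expiry, e.g. the `EXPIRE` command):

```python
import time


class TTLCache:
    """Toy session cache: entries disappear after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Lazy expiry: the entry is dropped when it is next read.
            del self._data[key]
            return None
        return value


# A fake clock (assumption for the demo) lets us jump forward in time.
now = [0.0]
sessions = TTLCache(ttl_seconds=30, clock=lambda: now[0])
sessions.set("session:abc", {"user": "Henrik"})
fresh = sessions.get("session:abc")
now[0] = 31.0
expired = sessions.get("session:abc")
print(fresh, expired)
```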

One more thing...

Redis complex data types: lists, sets, maps and streams.

Wide-Column

What does it do???
  • Tables, rows & columns.
  • All data access uses Primary Key
  • PK can be composite:
    Partition Key + Clustering Keys
  • Partition Key is required
CREATE TABLE people (id UUID PRIMARY KEY, firstname text, lastname text);

INSERT INTO people (id, lastname, firstname) 
    VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'Ingo','Henrik');

SELECT lastname, firstname FROM people WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
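
The composite primary key can be pictured as a two-level lookup: the partition key picks a partition, and rows inside it are kept sorted by clustering key. The sketch below is a simplified model, not Cassandra's implementation; `WideColumnTable`, the event data and the integer clustering keys are hypothetical.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table: partition key -> rows sorted by clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> [(clustering key, row)]

    def insert(self, partition_key, clustering_key, row):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, row)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, row))

    def select(self, partition_key, ck_from=None, ck_to=None):
        # The partition key is required; clustering keys only narrow the range.
        if partition_key is None:
            raise ValueError("partition key is required")
        rows = self.partitions.get(partition_key, [])
        return [row for ck, row in rows
                if (ck_from is None or ck >= ck_from)
                and (ck_to is None or ck <= ck_to)]


events = WideColumnTable()
events.insert("henrik", 3, {"event": "logout"})
events.insert("henrik", 1, {"event": "login"})
events.insert("henrik", 2, {"event": "view"})
events.insert("someone-else", 1, {"event": "login"})
result = events.select("henrik", ck_from=2)
print(result)
```

Rows come back in clustering-key order regardless of insertion order, which is why range reads within one partition are cheap.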

Use cases

Large (aka Web Scale) databases: 100+ TB

Write optimized storage engine

Write availability (Dynamo HA)
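
"Write optimized" here refers to log-structured storage: Cassandra buffers writes in an in-memory memtable and flushes them to immutable sorted files (SSTables). A heavily simplified sketch of that idea, with a hypothetical `TinyLSM` class, a tiny flush threshold, and no commit log or compaction:

```python
import bisect


class TinyLSM:
    """Toy log-structured store: writes go to RAM, then to immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write never reads or rewrites existing data on "disk".
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest run wins, so search from the most recent flush backwards.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # reaches the limit and triggers a flush
db.put("a", 10)  # newer value stays in the memtable
print(db.get("a"), db.get("b"), len(db.sstables))
```

The trade-off is visible in `get`: reads may have to check several runs, which is the price paid for cheap writes.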

One more thing...

Useful secondary indexes: See Cassandra 4.0 and DataStax Enterprise 6.8.3.

Document

What does it do???
  • Records are JSON or XML
  • Flexible schema:
    Structure but not fixed
  • Secondary indexes, complex queries, transactions
> db.somecollection.insert({firstname: "Henrik", lastname: "Ingo", age: 42})
> db.somecollection.createIndex({lastname:1, firstname:1});

> db.somecollection.find({lastname: "Ingo"})
{_id: ObjectId("507f1f77bcf86cd799439011"), firstname: "Henrik", 
lastname: "Ingo", age: 42}
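
Flexible schema plus secondary indexes can be modelled with plain dictionaries. The sketch below is illustrative only: `Collection` is a hypothetical class, ids are simple counters rather than ObjectIds, and an index is used only when the query names exactly the indexed fields (MongoDB's compound indexes are more capable than this).

```python
from collections import defaultdict


class Collection:
    """Toy document collection with optional secondary indexes."""

    def __init__(self):
        self.docs = {}     # _id -> document
        self.indexes = {}  # tuple of field names -> {tuple of values: set of _id}
        self._next_id = 1

    @staticmethod
    def _key(doc, fields):
        return tuple(doc.get(f) for f in fields)

    def insert(self, doc):
        doc = dict(doc)
        doc.setdefault("_id", self._next_id)
        self._next_id += 1
        self.docs[doc["_id"]] = doc
        for fields, index in self.indexes.items():
            index[self._key(doc, fields)].add(doc["_id"])
        return doc["_id"]

    def create_index(self, *fields):
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            index[self._key(doc, fields)].add(_id)
        self.indexes[fields] = index

    def find(self, query):
        for fields, index in self.indexes.items():
            if set(fields) == set(query):
                key = tuple(query[f] for f in fields)
                return [self.docs[i] for i in sorted(index.get(key, ()))]
        # No usable index: fall back to scanning every document.
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in query.items())]


people = Collection()
people.insert({"firstname": "Henrik", "lastname": "Ingo", "age": 42})
people.insert({"nickname": "no lastname field here"})  # different shape, same collection
people.create_index("lastname", "firstname")
by_index = people.find({"lastname": "Ingo", "firstname": "Henrik"})
by_scan = people.find({"lastname": "Ingo"})
print(by_index)
```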

Use cases

General purpose database. Competes with RDBMS.

Main selling points compared to relational:
JSON API, flexible schema, sharding.

Flexible schema strengths: Data hub.

What does the future look like...

Incremental innovation? Performance, GUI tools, integrations, SDKs...

Graph

What does it do???
  • Records are nodes, connected by edges
  • Both can have properties
  • Indexes enable queries
gremlin> graph = TinkerFactory.createModern()
==> tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==> graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name','marko').out('knows').values('name')
==> 'vadas'
==> 'josh'
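
For readers without a Gremlin console, the same traversal can be imitated in plain Python. The data mirrors TinkerPop's "modern" toy graph; the `Graph` class and its `has`/`out` helpers are hypothetical stand-ins for the Gremlin steps, and `has` scans every vertex where a real graph database would use an index.

```python
class Graph:
    """Toy property graph: vertices and edges both carry properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = []     # (out vertex id, label, in vertex id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, out_v, label, in_v, **props):
        self.edges.append((out_v, label, in_v, props))

    def has(self, key, value):
        # Full scan; this is the step an index would speed up.
        return [vid for vid, p in self.vertices.items() if p.get(key) == value]

    def out(self, vids, label):
        wanted = set(vids)
        return [in_v for out_v, lbl, in_v, _ in self.edges
                if out_v in wanted and lbl == label]


g = Graph()
g.add_vertex(1, name="marko", age=29)
g.add_vertex(2, name="vadas", age=27)
g.add_vertex(3, name="lop", lang="java")
g.add_vertex(4, name="josh", age=32)
g.add_vertex(5, name="ripple", lang="java")
g.add_vertex(6, name="peter", age=35)
g.add_edge(1, "knows", 2, weight=0.5)
g.add_edge(1, "knows", 4, weight=1.0)
g.add_edge(1, "created", 3, weight=0.4)
g.add_edge(4, "created", 5, weight=1.0)
g.add_edge(4, "created", 3, weight=0.4)
g.add_edge(6, "created", 3, weight=0.2)

# Equivalent of g.V().has('name','marko').out('knows').values('name')
names = [g.vertices[v]["name"] for v in g.out(g.has("name", "marko"), "knows")]
print(names)
```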

Use cases

Analytical. Find friends of friends that own a cat
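
The "friends of friends that own a cat" query is a two-hop traversal followed by a property filter. A minimal sketch over made-up data (the names, the `friends` and `pets` dictionaries and the function name are all hypothetical):

```python
# Undirected friendships stored as adjacency sets.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}
pets = {"bob": ["cat"], "dave": ["cat"], "erin": ["dog"]}


def friends_of_friends_with_cat(person):
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    # Exclude the person and their direct friends: only true second-degree contacts.
    two_hops -= direct | {person}
    return sorted(p for p in two_hops if "cat" in pets.get(p, []))


cat_owners = friends_of_friends_with_cat("alice")
print(cat_owners)
```

Bob owns a cat too, but he is a direct friend of Alice, so only Dave qualifies. In a relational schema this would take two self-joins on a friendship table plus a join to pets.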

Social media, recommendation engines, etc.

National security

One more thing...

Gremlin, Cypher, GraphQL

OLTP graph databases exist. (DataStax)

Interesting unsolved problem: Optimal sharding for graph DBs.

Query Engine

What does it do???
  • Formerly known as Hadoop
  • "Batch" queries
  • But also interactive
  • Data stored in HDFS, S3, Cassandra, MongoDB...
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Froscon demo").getOrCreate()
import spark.implicits._
val df = spark.read.json("people.json")
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
+-----------+----------+-----+
| firstname | lastname | age |
+-----------+----------+-----+
| Henrik    | Ingo     | 43  |
+-----------+----------+-----+
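
The pattern in the Spark snippet (read semi-structured files, expose them as a table, query with SQL) can be tried without a cluster. The sketch below is not Spark: it swaps in Python's built-in `sqlite3` as a stand-in engine, and the contents of `people.json` are assumed to be one JSON object per line.

```python
import json
import sqlite3

# Assumed contents of people.json: one JSON object per line.
json_lines = '{"firstname": "Henrik", "lastname": "Ingo", "age": 43}\n'

rows = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

# Load the parsed records into an in-memory table, then query them with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, lastname TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (:firstname, :lastname, :age)", rows)
result = conn.execute("SELECT * FROM people").fetchall()
print(result)
```

The difference that matters in practice is where the data lives: a query engine reads it in place from HDFS or S3 and spreads the work over many machines, while this sketch copies everything into one process.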

Use cases

Data lake. S3.

Personalized user profile

Fraud detection, national security...

"Reporting"

One more thing...

Spark Streaming (mini-batch)
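
Mini-batching means chopping an unbounded stream into small batches and running an ordinary batch computation on each. A minimal sketch, assuming a hypothetical `mini_batches` helper that groups by element count (Spark Streaming groups by time interval instead):

```python
from itertools import islice


def mini_batches(stream, batch_size):
    """Yield lists of up to batch_size items until the stream is exhausted."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


events = [3, 1, 4, 1, 5, 9, 2]
# The per-batch "job" here is just a sum.
totals = [sum(batch) for batch in mini_batches(events, 3)]
print(totals)
```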

AWS Athena = Presto

Search

> curl -XPOST http://localhost:9200/froscon/people/id1 \
 -H 'Content-Type: application/json' -d '{"name":"Henrik Ingo"}'

> curl -XGET 'localhost:9200/froscon/_search?q=name:Ingo'

[{_index: "froscon", _type: "people", _id: "id1", _source:
{name: "Henrik Ingo"}}]
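
What makes the query `name:Ingo` match a document containing "Henrik Ingo" is an inverted index: text is split into terms, and each term points back to the documents that contain it. A minimal sketch, assuming a hypothetical `InvertedIndex` class with a naive lowercase word tokenizer and no ranking:

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: (field, term) -> ids of documents containing the term."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, value in doc.items():
            # Naive analysis step: lowercase and split on non-word characters.
            for term in re.findall(r"\w+", str(value).lower()):
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        ids = self.postings.get((field, term.lower()), set())
        return [{"_id": i, "_source": self.docs[i]} for i in sorted(ids)]


idx = InvertedIndex()
idx.index("id1", {"name": "Henrik Ingo"})
hits = idx.search("name", "Ingo")
print(hits)
```

A B-tree index on the whole `name` value could only match "Henrik Ingo" exactly or by prefix; the term-level index is what allows matching a word in the middle of a field.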

Use cases

Google for your website

Queries beyond the typical RDBMS BTree

Kibana analytics

Security monitoring

One more thing...

Elastic = MongoDB in size

Until 2018

 | Apache/BSD* | GPL | open core | proprietary
Key-Value | Memcache | Redis
Wide Col | Cassandra | BigTable, DynamoDB
Document | MongoDB | MarkLogic
Graph | Neo4j | DSE Graph
Query Eng | Spark, Presto | Athena
Search | Lucene, Solr | Elastic

After 2018

 | Apache/BSD* | GPL | open core | proprietary
Key-Value | Memcache | Redis | Redis
Wide Col | Cassandra | BigTable, DynamoDB
Document | MongoDB
Graph | Neo4j
Query Eng | Spark, Presto | Athena
Search | Lucene, Solr, Open Distro | Elastic

Image credits:

jay~dee @ Flickr